Micro blog user recommendation algorithm based on similarity of multi-source information

YAO Binxiu, NI Jiancheng, YU Pingping, LI Linlin, CAO Bo

Journal of Computer Applications 2017, 37 (5): 1382-1386. DOI: 10.11772/j.issn.1001-9081.2017.05.1382

To address the data sparsity and low recommendation accuracy of traditional Collaborative Filtering (CF) recommendation algorithms, a micro blog User Recommendation algorithm based on the Similarity of Multi-source Information, named MISUR, was proposed. First, micro blog users were classified with the K-Nearest Neighbor (KNN) algorithm according to their tag information. Second, within each class, the similarity of multi-source information (micro blog content, interactive relationships and social information) was calculated for each user. Third, a time weight and a richness weight were introduced to compute the total multi-source similarity, and Top-N recommendations were made in descending order of that similarity. Finally, experiments were carried out on the parallel computing framework Spark. The experimental results show that MISUR outperforms both the CF recommendation algorithm and the micro blog Friend Recommendation algorithm based on Multi-social Behavior (MBFR) in accuracy, recall and efficiency.
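
The abstract does not give the weighting formulas, so the following is only a minimal sketch of how per-source similarities might be combined and ranked. The exponential time decay, the richness-proportional source weights, and every function and parameter name here are assumptions made for illustration, not the MISUR definitions.

```python
import heapq
import math


def time_weight(days_since_activity, decay=0.05):
    """Hypothetical time weight: recent activity counts more (assumed exponential decay)."""
    return math.exp(-decay * days_since_activity)


def total_similarity(sims, richness, days_since_activity):
    """Combine per-source similarities into one score.

    sims:     {source_name: similarity in [0, 1]}
    richness: {source_name: amount of data available for that source}
    Each source is weighted by its share of the total richness (an assumption).
    """
    total_richness = sum(richness.values())
    if total_richness == 0:
        return 0.0
    combined = sum(sims[s] * richness[s] / total_richness for s in sims)
    return time_weight(days_since_activity) * combined


def top_n(candidates, n):
    """Return the n (user, score) pairs with the highest scores, best first."""
    return heapq.nlargest(n, candidates, key=lambda c: c[1])
```

A candidate list would be scored with `total_similarity` per user and then passed to `top_n`; in a Spark setting the scoring step would run per partition, which this single-machine sketch does not show.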

Weighted Slope One algorithm based on clustering and Spark framework

LI Linlin, NI Jiancheng, YU Pingping, YAO Binxiu, CAO Bo

Journal of Computer Applications 2017, 37 (5): 1287-1291. DOI: 10.11772/j.issn.1001-9081.2017.05.1287

The traditional Slope One algorithm does not consider the influence of item attribute information and the time factor on item similarity, and it suffers from high computational complexity and slow processing on large datasets. To address this, a weighted Slope One algorithm based on clustering and the Spark framework was put forward. First, a time weight was added to the traditional item rating similarity, which was then combined with the item attribute similarity to give a comprehensive similarity. Next, the set of nearest neighbors was generated with the Canopy-K-means algorithm. Finally, the data was partitioned and iterated to achieve parallelization on the Spark framework. The experimental results show that the improved algorithm on Spark is more accurate than both the traditional Slope One algorithm and the Slope One algorithm based on user similarity, improves operating efficiency by 3.5 to 5 times compared with the Hadoop platform, and is better suited to recommendation on large-scale datasets.
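
For orientation, here is a minimal single-machine sketch of the standard weighted Slope One baseline that the paper extends. It deliberately omits the paper's time weight, item attribute similarity, Canopy-K-means neighbor selection and Spark parallelization; the function names and the `{user: {item: rating}}` layout are assumptions for illustration.

```python
from collections import defaultdict


def fit_deviations(ratings):
    """Compute average rating deviations between item pairs.

    ratings: {user: {item: rating}}
    Returns (dev, count), where dev[(j, i)] is the mean of r_j - r_i over
    users who rated both items and count[(j, i)] is the number of such users.
    """
    diff_sum = defaultdict(float)
    count = defaultdict(int)
    for user_ratings in ratings.values():
        for j, r_j in user_ratings.items():
            for i, r_i in user_ratings.items():
                if i != j:
                    diff_sum[(j, i)] += r_j - r_i
                    count[(j, i)] += 1
    dev = {pair: diff_sum[pair] / count[pair] for pair in diff_sum}
    return dev, count


def predict(user_ratings, target, dev, count):
    """Weighted Slope One prediction of `target` for one user, or None if no overlap."""
    numerator = 0.0
    denominator = 0
    for i, r_i in user_ratings.items():
        pair = (target, i)
        if i != target and pair in dev:
            # Each deviation is weighted by how many users support it.
            numerator += (dev[pair] + r_i) * count[pair]
            denominator += count[pair]
    return numerator / denominator if denominator else None
```

The pairwise loop in `fit_deviations` is the part whose cost grows quickly with data size, which is the motivation the abstract gives for restricting to nearest neighbors and distributing the work.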

Highly efficient Chinese text classification algorithm of KNN based on Spark framework

YU Pingping, NI Jiancheng, YAO Binxiu, LI Linlin, CAO Bo

Journal of Computer Applications 2016, 36 (12): 3292-3297. DOI: 10.11772/j.issn.1001-9081.2016.12.3292

The time complexity of the K-Nearest Neighbor (KNN) classification algorithm is proportional to the number of training samples, so it requires a large amount of computation, and traditional architectures become a processing bottleneck on big data. To solve these problems, a highly efficient KNN algorithm based on the Spark framework and clustering was proposed. First, the training set was pruned twice by an optimized K-medoids algorithm that introduces a constriction factor. Then, the value of K was adjusted iteratively during classification to obtain the classification result. Throughout the calculation, the data was partitioned and iterated to achieve parallelization with the Spark framework. The experimental results show that, on different datasets, the classification time of the traditional KNN algorithm and of the KNN algorithm based on K-medoids is 3.92 to 31.90 times that of the proposed algorithm. The proposed algorithm has high computational efficiency and a better speedup ratio than KNN on the Hadoop platform, and it can classify big data effectively.
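
To make the cost argument concrete, the sketch below shows plain brute-force KNN, the baseline whose per-query work grows with the training set. It is not the proposed algorithm: the K-medoids pruning, the iterated K and the Spark partitioning are all left out, and the `(vector, label)` data layout is an assumption.

```python
import math
from collections import Counter


def knn_classify(train, query, k):
    """Classify `query` by majority vote among its k nearest training samples.

    train: list of (vector, label) pairs; vectors are equal-length numeric tuples.
    Every call computes a distance to every training sample, so the cost per
    query is proportional to len(train).
    """
    nearest = sorted(train, key=lambda sample: math.dist(sample[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

Pruning the training set before classification, as the abstract describes, reduces the number of distance computations that each call has to perform.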

Discovery method of travelling companions based on big data of license plate recognition

CAO Bo, HAN Yanbo, WANG Guiling

Journal of Computer Applications 2015, 35 (11): 3203-3207. DOI: 10.11772/j.issn.1001-9081.2015.11.3203

The discovery of travelling companions through processing and analysis of license plate recognition big data is widely used in applications such as the tracking of involved vehicles. However, discovery algorithms for travelling companions perform poorly in both time and space when run on a single machine. To solve this problem, a discovery method for travelling companions named FP-DTC was proposed. The method parallelizes the FP-Growth algorithm on the distributed processing framework Spark, with further improvements and optimizations to discover travelling companions more efficiently. The experimental results show that the method performs well at discovering travelling companions and achieves nearly a fourfold improvement over the same algorithm on Hadoop.
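
The idea of travelling companions can be illustrated with a much simpler technique than FP-Growth. The sketch below uses plain pair counting on one machine: vehicles seen at the same checkpoint in the same time bucket form a transaction, and pairs that co-occur in enough transactions are reported. The record layout, the fixed-bucket windowing and all names are assumptions for illustration, not the FP-DTC method.

```python
from collections import defaultdict
from itertools import combinations


def find_companions(records, window_seconds, min_support):
    """Find plate pairs that repeatedly pass the same checkpoint close in time.

    records: iterable of (plate, checkpoint, timestamp_seconds)
    Returns {(plate_a, plate_b): co-occurrence count} for pairs whose count
    is at least min_support.
    """
    # Group plates into transactions keyed by checkpoint and time bucket.
    transactions = defaultdict(set)
    for plate, checkpoint, ts in records:
        transactions[(checkpoint, ts // window_seconds)].add(plate)

    # Count how many transactions each pair of plates shares.
    pair_counts = defaultdict(int)
    for plates in transactions.values():
        for pair in combinations(sorted(plates), 2):
            pair_counts[pair] += 1

    return {pair: c for pair, c in pair_counts.items() if c >= min_support}
```

Fixed buckets can separate two vehicles that pass just before and just after a bucket boundary, and pair counting does not scale to larger companion groups; frequent-itemset mining such as FP-Growth addresses the latter.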
